Abstract

It is well known that the out-of-sample performance of Markowitz’s
mean-variance portfolio criterion can be negatively affected by estimation
errors in the mean and covariance. In this paper we address the
problem by regularizing the mean-variance objective function with a
weighted elastic net penalty. We show that the use of this penalty can
be motivated by a robust reformulation of the mean-variance criterion
that directly accounts for parameter uncertainty. With this interpretation
of the weighted elastic net penalty we derive data driven techniques
for calibrating the weighting parameters based on the level of
uncertainty in the parameter estimates. We test our proposed technique
on US stock return data and our results show that the calibrated
weighted elastic net penalized portfolio outperforms both the unpenalized
portfolio and uniformly weighted elastic net penalized portfolio.

This paper also introduces a novel Adaptive Support Split-Bregman
approach which leverages the sparse nature of $\ell_1$ penalized portfolios
to efficiently compute a solution of our proposed portfolio criterion.
Numerical results show that this modification to the Split-Bregman
algorithm results in significant improvements in computational speed
compared with other techniques.
1 Introduction

The birth of modern portfolio theory occurred in 1952 with the seminal 
publication of Harry Markowitz’s criterion [29] for optimal single period 
portfolio construction that balances a portfolio’s risk with return potential. 
A key assumption in modern portfolio theory is that given two portfolios 
with the same expected return an investor will always choose the portfolio 
with minimal risk. Markowitz proposed using the variance of a portfolio’s
return as the measure of the portfolio’s risk. Thus Markowitz formulated the
portfolio selection problem as minimizing portfolio return variance subject
to a minimum expected value of return. Mathematically the Markowitz
formulation can be written as a quadratic programming problem and the
optimal portfolio can be computed using a variety of quadratic programming
methods.

One shortcoming of the Markowitz criterion for portfolio optimization is 
that it requires the practitioner to specify the expected return of each asset 
and the covariance of the returns of different assets. This presents a problem 
to an investor because the future mean and covariance matrix are not known. 
If incorrect parameter values are used then the portfolio performance will 
be sub-optimal. This additional risk due to parameter uncertainty is
commonly referred to as estimation risk. 

An intuitive technique that can be utilized when the mean and covariance
are unknown is to estimate the mean and covariance matrix from historical
return data and to plug in the estimated parameters in place of the
true values. One approach to estimating the unknown parameters is to use
sample averaging, which is maximum likelihood (ML) optimal when the returns
are i.i.d. and normally distributed. This approach can be very accurate when
the data is normally distributed and sufficient training data is available. For
data that is not normally distributed, robust estimation techniques for the
covariance matrix can be considered [35], [5].

Although the sample average and plug-in approach is intuitive, there are 
difficulties in effectively implementing it. The primary difficulty is that there 
is often a limited amount of relevant historical financial return data available 
to estimate the mean and covariance. One reason for the lack of relevant data 
is that the investments’ return statistics can be time-varying. Thus only a 
limited amount of past data is relevant in estimating the current mean and 
covariance. Since the volatility of asset returns can be large, the errors in
the parameter estimates obtained from averaging only a small number of
samples can be large. Further complicating the problem is that the covariance
matrix can be ill-conditioned. This makes the portfolio weights extremely
sensitive to small parameter errors. The effect of these estimation errors is
risk-return performance that departs significantly from the optimal
performance under known statistics.

As an alternative to sample average estimates, Bayesian estimators for
both mean and covariance have been proposed [16], [23], [25]. These estimators
effectively “shrink” the sample average estimates towards a more structured
estimate (via a convex combination) which takes into account prior
knowledge. Prior knowledge can take the form of structured data models such
as a single factor model [31] or the Fama-French three-factor model [12].
Shrinking the sample average estimates towards the more structured model
reduces the variability in the parameter estimates and can improve
out-of-sample portfolio performance.

Another approach that has been shown to improve out-of-sample performance
involves regularizing the portfolio selection criterion by adding penalties
to the objective function, such as portfolio norm penalties.
In [7] $\ell_1$ and squared $\ell_2$ norm constraints are proposed for the minimum
variance criterion and a cross-validation procedure is suggested to calibrate
the constraints. In [39], [38] an elastic net penalty is proposed in the
context of constrained minimum variance portfolio optimization. The authors
also derive a method to calibrate the elastic net penalty which is designed
to ensure that the variance of the resulting portfolio will not exceed the
unpenalized portfolio variance (asymptotically). In [15] the authors study
convex penalties such as a weighted LASSO approach [20] and a non-convex
SCAD penalty [13] with application to minimum-variance portfolios. For
the weighted LASSO approach the authors propose a calibration scheme
where the weights are selected according to the variability in the volatility
of each asset.

The above norm constrained and penalty approaches for portfolio
optimization primarily focused on the minimum variance approach.
Consequently the calibration and justification of the norm penalties above were
derived by considering only the portfolio variance. The mean return is
ignored in the calibration of the penalty. In this paper we propose a method
which can be applied to the mean-variance criterion where both portfolio
mean and variance are considered. In this setting we propose regularizing
the objective function with a weighted elastic net penalty. A weighted
elastic net penalty is a linear combination of a portfolio’s weighted $\ell_1$ norm
and the square of a portfolio’s weighted $\ell_2$ norm. We show that the use
of the weighted elastic net penalty can be justified by reformulating the
mean-variance criterion as a robust optimization problem where the
mean and volatilities of the asset returns belong to a known uncertainty
set. With this robust optimization interpretation, data driven techniques
for calibrating the weight parameters in the weighted elastic net penalty are
derived.

Our proposed penalized criterion, which is equivalent to a special case
of a robust optimization problem, has two advantages over the general
robust portfolio optimization problem. First, our method can be solved using
fast and well established algorithms for $\ell_1$ penalized optimization problems
such as the Split-Bregman algorithm [18] and the FISTA algorithm [2]. In
the more general case, solving the robust portfolio optimization problem
requires using semi-definite programming techniques which are intractable
for large portfolios. Second, our formulation of the problem results in sparse
portfolios which can contribute to reducing portfolio turnover and
transaction costs. The general robust optimization problem does not necessarily
result in a sparse portfolio.

This paper also addresses computational aspects of computing weighted
elastic net penalized portfolios. In particular, we propose a novel Adaptive
Support Split-Bregman approach to computing weighted elastic net
penalized portfolios. This new algorithm exploits the sparse nature of elastic net
penalized solutions to minimize computational requirements. We show that
this results in significant improvements in convergence speed versus other
solvers.

The remainder of this paper’s body is organized as follows: Section 2 
introduces the weighted elastic net penalty and provides a justification for its 
use. In Section 3 we discuss the Adaptive Support Split-Bregman approach 
for computing the optimal portfolio. Section 4 presents experimental results 
using US equity data that demonstrate the benefit of our proposed approach. 
Finally in Section 5 we state our conclusions and a path forward for future 
work. The appendix contains proofs of some technical results presented in 
Section 3. 

2 Portfolio Selection Criteria 

In this section we first review the mean-variance portfolio selection
criterion. We then present the weighted elastic-net penalized portfolio selection
criterion and motivate its use.

2.1 Mean-Variance Portfolio selection criterion 

Suppose that there exists a set of $N$ risky assets and let $\{s_n(k)\}_{n=1}^{N}$ be the
prices of each asset at time $k$. Then the excess return of the $n$th asset for
time period $k$ is defined as

\[ r_n(k) = \frac{s_n(k+1) - s_n(k)}{s_n(k)} - r^{(F)}(k) \tag{1} \]

where $r^{(F)}(k)$ is the return of a risk-free asset at time $k$. We model the
excess returns $r_n(k)$ as random variables with finite mean and covariance. A
portfolio is defined to be a set of weights $\{w_n\}_{n=1}^{N} \subset \mathbb{R}$. If $w_i > 0$ a long
position has been taken in the $i$th asset whereas $w_i < 0$ indicates a short position.

The mean-variance criterion proposed by Markowitz [29] addresses single
period portfolio selection. A portfolio of risky assets, $w$, is mean-variance
optimal if it is a solution to the following optimization problem

\[ \min_{w} \; \varphi\, w^T \Gamma w - \mu^T w \tag{2} \]

where $\Gamma$ and $\mu$ are the covariance and mean of $r$ for the time period of
interest and where $\varphi > 0$ is a risk aversion coefficient (since $\varphi$ will only affect
the portfolio weights up to a positive scalar multiple we shall set $\varphi = 1$).
Assuming that $\Gamma$ is symmetric and positive definite we have that (2) is a
convex quadratic program whose solution, $w^*$, satisfies, up to a positive
scalar multiple,

\[ \Gamma w^* = \mu \tag{3} \]
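
As a small illustration of the unpenalized criterion, the following Python sketch solves (2) directly. It assumes NumPy, and the function name `mean_variance_weights` and the two-asset numbers are hypothetical choices made only for this example. Setting the gradient of the objective in (2) to zero gives $2\varphi\Gamma w = \mu$, so the minimizer is proportional to $\Gamma^{-1}\mu$.

```python
import numpy as np

def mean_variance_weights(gamma, mu, phi=1.0):
    """Minimizer of phi * w'Gamma w - mu'w for symmetric positive definite Gamma."""
    # First-order condition of the quadratic objective: 2 * phi * Gamma w = mu.
    return np.linalg.solve(2.0 * phi * gamma, mu)

# Illustrative (made-up) covariance and mean excess returns for two assets.
gamma = np.array([[0.04, 0.01],
                  [0.01, 0.09]])
mu = np.array([0.02, 0.03])

w_star = mean_variance_weights(gamma, mu)        # phi = 1
w_half = mean_variance_weights(gamma, mu, 2.0)   # larger risk aversion
```

Changing the risk aversion coefficient only rescales the weights (`w_half` is `w_star / 2`), which is why the text can fix it at 1.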

Estimation of parameters is necessary to implement the mean-variance
criterion. It has been recognized that estimation of the mean return is more
difficult than estimation of the covariance and thus a minimum variance
criterion is often advocated in the recent literature. In the minimum variance
criterion the means of the asset returns are ignored and the following criterion is
used for portfolio selection

\[ \min_{w} \; w^T \Gamma w \quad \text{s.t.} \quad \sum_{i=1}^{N} w_i = 1. \tag{4} \]

Despite ignoring all information on the mean return, the minimum variance
criterion often outperforms the mean-variance criterion when judged by
out-of-sample Sharpe ratio.
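
For a positive definite covariance the budget-constrained problem (4) has the well-known closed-form solution $w = \Gamma^{-1}\mathbf{1} / (\mathbf{1}^T \Gamma^{-1} \mathbf{1})$. The sketch below, assuming NumPy and using a hypothetical function name and made-up covariance, evaluates it.

```python
import numpy as np

def min_variance_weights(gamma):
    """Closed-form solution of min w'Gamma w subject to sum(w) = 1."""
    ones = np.ones(gamma.shape[0])
    x = np.linalg.solve(gamma, ones)   # Gamma^{-1} 1
    return x / (ones @ x)              # normalize so the weights sum to one

gamma = np.array([[0.04, 0.01],
                  [0.01, 0.09]])       # illustrative covariance
w_mv = min_variance_weights(gamma)
equal = np.full(2, 0.5)                # equal-weight portfolio for comparison
```

For this covariance the minimum variance weights are (8/11, 3/11), and their variance is lower than that of the equal-weight portfolio.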

2.2 Norm Penalized Portfolio optimization 

As was stated in the introduction, mean-variance portfolio optimization is
sensitive to parameter estimation error. To address these concerns a number
of norm penalized criteria have been proposed, primarily in the context
of minimum variance optimization. Commonly used convex norm penalties
include the $\ell_1$ norm, squared $\ell_2$ norm and elastic net penalties [39]. The $\ell_1$
and squared $\ell_2$ norm penalties are given as

\[ \lambda \sum_{i=1}^{N} |w_i| \tag{5} \]

and

\[ \lambda \sum_{i=1}^{N} w_i^2 \tag{6} \]

respectively, where $\lambda > 0$ is a weighting factor. The elastic net penalty is a
weighted sum of the $\ell_1$ and squared $\ell_2$ norm penalties

\[ \lambda_1 \sum_{i=1}^{N} |w_i| + \lambda_2 \sum_{i=1}^{N} w_i^2 \tag{7} \]

where $\lambda_1, \lambda_2 > 0$. Another convex penalty is the adaptive LASSO penalty [20]
which was applied to minimum variance optimization in [15]. The adaptive
LASSO penalty is a weighted $\ell_1$ norm given by

\[ \|w\|_{\beta,\ell_1} = \sum_{k=1}^{N} \beta_k |w_k| \tag{8} \]

where $\beta_k > 0$. Calibration of the weighting parameters for the above
penalties has primarily been studied with the goal of improving the portfolio
return variance [39], [15].
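
The convex penalties (5) to (8) are simple to evaluate. The following Python sketch, assuming NumPy, writes each one as a function; the function names and the sample weights are hypothetical and serve only to make the definitions concrete.

```python
import numpy as np

def l1_penalty(w, lam):
    """Penalty (5): lam * sum_i |w_i|."""
    return lam * np.sum(np.abs(w))

def squared_l2_penalty(w, lam):
    """Penalty (6): lam * sum_i w_i^2."""
    return lam * np.sum(w ** 2)

def elastic_net_penalty(w, lam1, lam2):
    """Penalty (7): weighted sum of the l1 and squared l2 penalties."""
    return l1_penalty(w, lam1) + squared_l2_penalty(w, lam2)

def adaptive_lasso_penalty(w, beta):
    """Penalty (8): weighted l1 norm with per-asset weights beta_k > 0."""
    return np.sum(beta * np.abs(w))

w = np.array([0.5, -0.25, 0.0])   # long, short and empty position
```

With uniform weights `beta = lam * np.ones(N)` the adaptive LASSO penalty reduces to the $\ell_1$ penalty (5).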

Several justifications for using $\ell_1$ and squared $\ell_2$ norms as penalties and
constraints have been given in the literature. For example in [3] it is stated
that the use of a uniformly weighted $\ell_1$ penalty can be motivated by the
desire to obtain sparse portfolios and to regularize the mean-variance
problem when the covariance is ill-conditioned. It has also been shown that
estimation risk in the mean-variance setting due to errors in the mean return
estimation is bounded above by

\[ \|\mu - \hat{\mu}\|_{\infty} \, \|w\|_{\ell_1} \tag{9} \]

and this upper bound has been used as a rationale for promoting small
$\|w\|_{\ell_1}$. In [26] it is mentioned that a benefit of using a uniformly
weighted $\ell_2$ norm penalty is to stabilize the inverse covariance matrix which
is often ill-conditioned in financial applications.
Non-convex penalized minimum-variance portfolio criteria were studied
in [15]. One such penalty examined in [15] is the Smoothly Clipped Absolute
Deviation (SCAD) penalty [13]. The SCAD penalty is defined as follows

\[ \sum_{i=1}^{N} p_\lambda(w_i) \tag{10} \]

where

\[ p_\lambda(x) = \begin{cases}
\lambda |x| & \text{if } |x| \le \lambda \\
-\dfrac{x^2 - 2 a_{SCAD} \lambda |x| + \lambda^2}{2 (a_{SCAD} - 1)} & \text{if } \lambda < |x| \le a_{SCAD} \lambda \\
\dfrac{(a_{SCAD} + 1) \lambda^2}{2} & \text{if } |x| > a_{SCAD} \lambda
\end{cases} \tag{11} \]

and where $a_{SCAD} > 2$. This penalty is similar to the $\ell_1$ penalty and was
initially proposed in the context of variable selection. Calibration of the
parameters $a_{SCAD}$ and $\lambda$ in (11) for portfolio optimization has not been fully
addressed in the literature.
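
The piecewise definition in (11) can be checked numerically. The sketch below, assuming NumPy, implements the standard SCAD penalty of [13]; the function name is hypothetical and the default `a = 3.7` is an assumption (a value commonly used in the variable selection literature), not a calibration proposed in this paper.

```python
import numpy as np

def scad_penalty(x, lam, a=3.7):
    """Elementwise SCAD penalty p_lam(x); requires a > 2."""
    x = np.abs(np.asarray(x, dtype=float))
    small = lam * x                                            # |x| <= lam
    middle = -(x ** 2 - 2.0 * a * lam * x + lam ** 2) / (2.0 * (a - 1.0))
    large = (a + 1.0) * lam ** 2 / 2.0                         # |x| > a * lam
    return np.where(x <= lam, small, np.where(x <= a * lam, middle, large))

w = np.array([0.5, -2.0, 10.0])
total = np.sum(scad_penalty(w, lam=1.0))   # penalty (10) for the portfolio w
```

The three pieces join continuously at $|x| = \lambda$ and $|x| = a_{SCAD}\lambda$, and the penalty is constant for large $|x|$, which is what makes it non-convex.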


2.3 Weighted Elastic Net Penalized Portfolio 


The preceding norm penalties are derived and calibrated primarily from a
minimum variance perspective. In this section we extend the above methods
for minimum variance portfolio design to mean-variance portfolios. Here we
propose augmenting the mean-variance criterion with the sum of a weighted
$\ell_1$ penalty and the square of a weighted $\ell_2$ penalty. The penalty terms in the new
portfolio selection criterion will be referred to as a weighted elastic net, which
has previously been studied in the context of variable selection.

Let $\{\alpha_k\}_{k=1}^{N}$ and $\{\beta_k\}_{k=1}^{N}$ be positive real numbers. Then the weighted
elastic net penalty for a portfolio $w$ is

\[ \|w\|_{\beta,\ell_1} + \|w\|_{\alpha,\ell_2}^2 \tag{12} \]

where

\[ \|w\|_{\beta,\ell_1} = \sum_{k=1}^{N} \beta_k |w_k| \tag{13} \]

and

\[ \|w\|_{\alpha,\ell_2}^2 = \sum_{k=1}^{N} \alpha_k |w_k|^2. \tag{14} \]

Thus the weighted elastic net penalized mean-variance criterion may be
written as

\[ \min_{w} \; w^T \hat{\Gamma} w - w^T \hat{\mu} + \|w\|_{\beta,\ell_1} + \|w\|_{\alpha,\ell_2}^2 \tag{15} \]

where $\hat{\Gamma}$ and $\hat{\mu}$ are estimates of $\Gamma$ and $\mu$ respectively.
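
To make criterion (15) concrete, the following Python sketch evaluates its objective for a given portfolio. It assumes NumPy; the function name and all numerical inputs are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import numpy as np

def weighted_elastic_net_objective(w, gamma_hat, mu_hat, alpha, beta):
    """Objective of (15): w'Gamma w - w'mu + sum beta_k|w_k| + sum alpha_k w_k^2."""
    quadratic = w @ gamma_hat @ w
    linear = w @ mu_hat
    l1_term = np.sum(beta * np.abs(w))      # weighted l1 norm (13)
    l2_term = np.sum(alpha * w ** 2)        # squared weighted l2 norm (14)
    return quadratic - linear + l1_term + l2_term

gamma_hat = np.array([[0.04, 0.01],
                      [0.01, 0.09]])
mu_hat = np.array([0.02, 0.03])
alpha = np.array([0.01, 0.02])
beta = np.array([0.005, 0.01])
w = np.array([1.0, -1.0])

value = weighted_elastic_net_objective(w, gamma_hat, mu_hat, alpha, beta)
# 0.11 (variance) + 0.01 (minus mean) + 0.015 (l1 term) + 0.03 (l2 term) = 0.165
```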
2.4 Motivation

A rationale for augmenting the mean-variance criterion with a weighted
elastic net penalty can be obtained by reformulating the mean-variance criterion
as a robust optimization problem. As was stated in the introduction it is
well-known that the out-of-sample performance of a mean-variance portfolio
can degrade significantly when there are errors in the estimates of the mean and
covariance. The risk due to estimation errors can be reduced by accounting
for them in the optimization criterion.

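One way to model the parameter estimation risk is to assume the true
covariance and mean belong to the following uncertainty sets

\[ A = \{ R : R_{i,j} = \hat{\Gamma}_{i,j} + e_{i,j}, \; |e_{i,j}| \le \Delta_{i,j}, \; R \succeq 0 \} \]

\[ B = \{ v : v_i = \hat{\mu}_i + c_i, \; |c_i| \le \beta_i \} \]

where the matrix $\Delta$ is symmetric and diagonally dominant with $\Delta_{i,j} \ge 0$ for
all $i,j$. This condition on $\Delta$ ensures that a matrix, $R$, of the form

\[ R_{i,j} = \begin{cases}
\hat{\Gamma}_{i,i} + \Delta_{i,i} & \text{if } i = j \\
\hat{\Gamma}_{i,j} \pm \Delta_{i,j} & \text{if } i \ne j
\end{cases} \]

is positive semi-definite (i.e. $R \succeq 0$).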

Since the mean and covariance are unknown, a conservative approach
to selecting a portfolio is to optimize the worst case performance over the
uncertainty sets. This can be written as the following robust optimization
problem

\[ \min_{w} \max_{R \in A, \, v \in B} \; w^T R w - v^T w. \tag{16} \]

Note that for a fixed $R$ and $v$ this problem is convex in $w$. Since the pointwise
maximum of a family of convex functions remains convex we have that

\[ \max_{R \in A, \, v \in B} \; w^T R w - v^T w \tag{17} \]

is convex in $w$. Performing the inner maximization with respect to $v$ reduces
the problem to

\[ \min_{w} \max_{R \in A} \; w^T R w + \sum_{i=1}^{N} \left( -\hat{\mu}_i + \beta_i \, \mathrm{sgn}(w_i) \right) w_i \tag{18} \]
where

\[ \mathrm{sgn}(w_i) = \begin{cases}
w_i / |w_i| & \text{if } w_i \ne 0 \\
0 & \text{else.}
\end{cases} \]

This can be re-written as

\[ \min_{w} \max_{R \in A} \; \mathrm{tr}(R w w^T) - w^T \hat{\mu} + \|w\|_{\beta,\ell_1}, \]

and the inner maximization with respect to $R$ can be solved in closed form.
Performing this final maximization gives us the following convex optimization
problem

\[ \min_{w} \; w^T \hat{\Gamma} w - w^T \hat{\mu} + |w|^T \Delta |w| + \|w\|_{\beta,\ell_1} \tag{19} \]

where the $N \times 1$ vector $|w|$ is defined as

\[ |w|_i = |w_i|. \tag{20} \]

Thus we see that problem (16) is equivalent to augmenting the mean-variance
criterion with a weighted pairwise elastic net penalty [28].
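
The two inner maximizations behind (18) and (19) can be written out coordinate by coordinate. The following worked step is a sketch in which $c_i$ and $e_{i,j}$ denote the perturbations that define the uncertainty sets $B$ and $A$; it relies on the diagonal dominance of $\Delta$ to guarantee that the maximizing $R$ is positive semi-definite and hence lies in $A$.

```latex
\begin{align*}
\max_{v \in B} \; -v^T w
  &= \sum_{i=1}^{N} \max_{|c_i| \le \beta_i} -(\hat{\mu}_i + c_i)\, w_i
   = -\hat{\mu}^T w + \sum_{i=1}^{N} \beta_i |w_i|
   && \text{(attained at } c_i = -\beta_i \,\mathrm{sgn}(w_i)\text{)} \\
\max_{R \in A} \; w^T R w
  &= w^T \hat{\Gamma} w + \sum_{i,j} \max_{|e_{i,j}| \le \Delta_{i,j}} e_{i,j}\, w_i w_j
   = w^T \hat{\Gamma} w + |w|^T \Delta\, |w|
   && \text{(attained at } e_{i,j} = \Delta_{i,j}\,\mathrm{sgn}(w_i w_j)\text{)}
\end{align*}
```

Adding the two right-hand sides gives the objective in (19).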

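When $\Delta$ equals the diagonal matrix $D_\alpha$ where

\[ D_\alpha = \begin{pmatrix}
\alpha_1 & 0 & \cdots & 0 \\
0 & \ddots & \ddots & \vdots \\
\vdots & \ddots & \ddots & 0 \\
0 & \cdots & 0 & \alpha_N
\end{pmatrix} \tag{21} \]

the criterion simplifies to the weighted elastic net penalized problem defined
in problem (15)

\[ \min_{w} \; w^T \hat{\Gamma} w - w^T \hat{\mu} + \|w\|_{\beta,\ell_1} + \|w\|_{\alpha,\ell_2}^2 \tag{22} \]

where $\alpha_i = \Delta_{i,i}$. This observation is summarized in the following theorem:

Theorem 1 The weighted elastic net penalized problem in (15) is equivalent
to the robust optimization problem in (16) when $\Delta = D_\alpha$.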
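
Theorem 1 can be sanity-checked numerically for a fixed portfolio. The sketch below, assuming NumPy and randomly generated (hypothetical) inputs, compares the penalized objective with the robust objective evaluated at the worst-case parameters, and also samples other parameters from the uncertainty box. Positive semi-definiteness of the sampled matrices is not enforced, so the samples come from a superset of $A$; the bound still applies to them.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
a = rng.standard_normal((n, n))
gamma_hat = a @ a.T / n                   # random positive semi-definite estimate
mu_hat = rng.standard_normal(n)
alpha = rng.uniform(0.01, 0.1, n)         # variance uncertainty levels
beta = rng.uniform(0.01, 0.1, n)          # mean uncertainty levels
w = rng.standard_normal(n)                # an arbitrary fixed portfolio

def robust_value(w, r, v):
    """Inner objective of (16) for fixed parameters R and v."""
    return w @ r @ w - v @ w

penalized = (w @ gamma_hat @ w - w @ mu_hat
             + np.sum(beta * np.abs(w)) + np.sum(alpha * w ** 2))

# Worst case when Delta = D_alpha: inflate each variance, shift each mean against w.
worst = robust_value(w, gamma_hat + np.diag(alpha), mu_hat - beta * np.sign(w))

# Random parameters inside the box never give a larger value.
samples = [robust_value(w,
                        gamma_hat + np.diag(rng.uniform(-alpha, alpha)),
                        mu_hat + rng.uniform(-beta, beta))
           for _ in range(1000)]
```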


2.5 Data Driven Calibration of Weighting Parameters 

We now address the problem of selecting the weighting parameters $\alpha$ and
$\beta$. Recall that Theorem 1 states that problems (15) and (16) are equivalent
when $\Delta = D_\alpha$. This implies that $\alpha$ and $\beta$ represent the level of uncertainty
in the variance and mean of each asset, respectively. Thus we propose setting
$\alpha$ and $\beta$ to be commensurate with the amount of error in our parameter estimates.

Since the amount of error in the parameter estimates is unknown, we
need to estimate it. One approach to estimating the amount of error is the
bootstrap method. Bootstrapping is a non-parametric approach that
has been applied to portfolio optimization [32] and calibration of robust
portfolio optimization problems [36]. One advantage of bootstrapping is
that it does not require specification of a distribution of the return data.

The first step of bootstrapping is to use $T_{train}$ time samples of past
training data to estimate $\mu_i$ and $\Gamma_{i,i}$ using estimators $f_{\mu_i}$ and $f_{\Gamma_{i,i}}$,
respectively. Common choices for $f_{\mu_i}$ and $f_{\Gamma_{i,i}}$ are sample averages or shrinkage
estimators. Once the parameter estimates are obtained, the training data
is resampled with replacement and additional estimates of $\mu_i$ and $\Gamma_{i,i}$ are
formed using the resampled data. The resampling can be described by
independent uniformly distributed integer valued random variables, $v_{k,m}$, taking
values between 1 and $T_{train}$. Here $k \in \{1, \ldots, K\}$ and $m \in \{1, \ldots, T_{train}\}$.
Under the condition that the estimators $f_{\mu_i}$ and $f_{\Gamma_{i,i}}$ are invariant to the
ordering of the training data, the bootstrap estimates of the estimation errors
may be defined as

\[ \mu_{i,err}(k) = \left| f_{\mu_i}\big(r_i(v_{k,1}), \ldots, r_i(v_{k,T_{train}})\big) - \hat{\mu}_i \right| \]

\[ \Gamma_{i,err}(k) = \left| f_{\Gamma_{i,i}}\big(r_i(v_{k,1}), \ldots, r_i(v_{k,T_{train}})\big) - \hat{\Gamma}_{i,i} \right| \]

respectively. Here $r_i(t)$ is the return of the $i$th asset in the $t$th training
sample. The percentiles of the empirical distributions of $\{\Gamma_{i,err}(k) \mid k = 1, \ldots, K\}$
and $\{\mu_{i,err}(k) \mid k = 1, \ldots, K\}$ can then be referenced to derive $\alpha_i$ and $\beta_i$. For
example, suppose $0 \le p_1, p_2 \le 1$. Then the values for $\alpha_i$ and $\beta_i$ can be defined
as

\[ \alpha_i = \min\{ x : |\{ n : \Gamma_{i,err}(n) \le x \}| \ge p_1 K \} \tag{23} \]

and

\[ \beta_i = \min\{ x : |\{ n : \mu_{i,err}(n) \le x \}| \ge p_2 K \} \tag{24} \]

where $K$ is the number of bootstrap estimates.

An economic interpretation of the percentile parameters $p_1$ and $p_2$ is that
of model estimation risk aversion factors. Here $p_1$ represents the aversion to
squared volatility estimation risk and $p_2$ is the aversion to mean estimation
risk. A percentile value of 0 corresponds to no aversion to estimation risk
whereas a value of 1 corresponds to a high aversion to estimation risk. Note
that a higher aversion to estimation risk will increase the weights in the
elastic net.
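
The bootstrap calibration in (23) and (24) can be sketched in a few lines of Python. This is a minimal illustration assuming NumPy, sample-average estimators for the mean and variance, and simulated returns; the function names, the number of bootstrap replications and the random seed are hypothetical choices. A percentile of 0 is mapped to the smallest bootstrap error.

```python
import numpy as np

def empirical_percentile(errors, p):
    """Smallest x such that at least p*K of the K bootstrap errors are <= x."""
    k = errors.shape[0]
    ordered = np.sort(errors, axis=0)
    rank = min(max(int(np.ceil(p * k)), 1), k)    # 1-based order statistic
    return ordered[rank - 1]

def bootstrap_calibration(returns, p1, p2, n_boot=200, seed=0):
    """returns: (T_train, N) array of excess returns. Gives (alpha, beta)."""
    rng = np.random.default_rng(seed)
    t_train, n_assets = returns.shape
    mu_hat = returns.mean(axis=0)
    var_hat = returns.var(axis=0, ddof=1)
    mu_err = np.empty((n_boot, n_assets))
    var_err = np.empty((n_boot, n_assets))
    for k in range(n_boot):
        rows = rng.integers(0, t_train, size=t_train)   # resample with replacement
        resampled = returns[rows]
        mu_err[k] = np.abs(resampled.mean(axis=0) - mu_hat)
        var_err[k] = np.abs(resampled.var(axis=0, ddof=1) - var_hat)
    alpha = empirical_percentile(var_err, p1)   # variance uncertainty, as in (23)
    beta = empirical_percentile(mu_err, p2)     # mean uncertainty, as in (24)
    return alpha, beta

returns = np.random.default_rng(1).normal(0.001, 0.02, size=(60, 3))
alpha, beta = bootstrap_calibration(returns, p1=0.9, p2=0.9)
```

Raising `p1` or `p2` can only increase the corresponding weights, which matches the interpretation of the percentiles as estimation risk aversion factors.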
3 Numerical methods

In this section we review some numerical algorithms for determining
solutions of (15). First we review an application of the Split-Bregman
algorithm [18] for solving (15). Then we propose a novel Adaptive Support
Split-Bregman approach which solves (15) faster than the Split-Bregman
algorithm by exploiting the sparse nature of the portfolio weights.


3.1 Optimality and Approximate Optimality Conditions 


In this section we derive approximate optimality conditions for (15). These
conditions are then used to design a numerical algorithm for determining
the solution of (15).

Let $\Psi(w)$ denote the objective function for the weighted elastic net
portfolio problem in equation (15)

\[ \Psi(w) = w^T \hat{\Gamma} w - w^T \hat{\mu} + \|w\|_{\beta,\ell_1} + \|w\|_{\alpha,\ell_2}^2
           = w^T R w - w^T \hat{\mu} + \|w\|_{\beta,\ell_1} \tag{25} \]

where $R = \hat{\Gamma} + D_\alpha$. Since $\Psi$ is convex, $w^*$ minimizes $\Psi$ if and only if

\[ 0 \in \partial \Psi(w^*) \tag{26} \]

where $\partial \Psi(w)$ is the sub-gradient of $\Psi$ evaluated at $w$ [3]. Note that since $R$
is positive definite, $\Psi$ is strictly convex and thus there is a unique solution
to (15).

In most cases we are only interested in portfolios that are approximately
optimal. Thus we can relax our optimality conditions to derive a stopping
criterion for any iterative solver of (15). Before introducing our relaxed
conditions we define the support of a portfolio $w$ as

\[ \mathrm{supp}(w) = \{ i : |w_i| > 0 \} \]

and define the smallest variance uncertainty as

\[ \alpha_0 = \min\{ \alpha_i : 0 < i \le N \}. \tag{27} \]

With the above definitions we have the following theorem which establishes
an approximate optimality condition.


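Theorem 2 Let $w^*$ be the solution of (15). Suppose that $w$ satisfies

\[ \sum_{i \in \mathrm{supp}(w)} \left( \frac{\partial}{\partial w_i} \left( w^T R w - w^T \hat{\mu} + \|w\|_{\beta,\ell_1} \right) \right)^2 \le 2 \epsilon \alpha_0 \tag{28} \]

and

\[ \beta_i \ge \left| \frac{\partial}{\partial w_i} \left( w^T R w - w^T \hat{\mu} \right) \right| \tag{29} \]

for all $i \notin \mathrm{supp}(w)$. Then

\[ \Psi(w) \le \Psi(w^*) + \epsilon \tag{30} \]

Proof 1 See Appendix.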


In a numerical algorithm it may happen that none of the portfolio weights
are exactly 0, although they may be extremely close to zero. Thus the above
theorem may not be very practical for use as a stopping criterion. For this
reason let us separate the small portfolio weights from the larger portfolio
weights. To do this we define

\[ \mathrm{supp}_\epsilon(w) = \{ i \in \mathrm{supp}(w) : |w_i| \le \epsilon \}. \]

With this definition we have the following theorem which suggests a more
practical stopping rule than Theorem 2.

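Theorem 3 Let $M \ge 2\|R\|_{\ell_2}$ and let $\epsilon > 0$ be given. Choose $\eta < \ldots$ Let
$w^*$ be the solution of (15). Suppose that $w$ satisfies

\[ \sum_{i \in \mathrm{supp}(w) \setminus \mathrm{supp}_\eta(w)} \left( \frac{\partial}{\partial w_i} \left( w^T R w - w^T \hat{\mu} + \|w\|_{\beta,\ell_1} \right) \right)^2 \le 2 \epsilon \alpha_0 \tag{31} \]

and

\[ -\beta_i + \epsilon \le \frac{\partial}{\partial w_i} \left( w^T R w - w^T \hat{\mu} \right) \le \beta_i - \epsilon \tag{32} \]

for $i \in \mathrm{supp}_\eta(w) \cup \mathrm{supp}(w)^c$. Then

\[ \Psi(\hat{w}) \le \Psi(w^*) + \frac{(\sqrt{2} + 1)^2}{2} \epsilon \tag{33} \]

where

\[ \hat{w}_i = \begin{cases}
0 & \text{if } i \in \mathrm{supp}_\eta(w) \\
w_i & \text{else.}
\end{cases} \]

Proof 2 See Appendix.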
3.2 Split-Bregman Algorithm

The weighted elastic net problem can be reformulated as a quadratic
program and solved using general purpose solvers. However the reformulation
involves adding an additional $N$ primal variables as well as $2N$ dual
variables. Thus this approach may not be applicable to large scale problems.

An algorithm better suited to handle problems like (15) is the Split-Bregman
algorithm. The Split-Bregman algorithm was introduced in [18]
for problems involving $\ell_1$ regularization such as (15). When using the
Split-Bregman method to solve (15) we solve an equivalent problem

\[ \min_{w,d} \; w^T R w - w^T \hat{\mu} + \|d\|_{\ell_1} \quad \text{s.t.} \quad d = \psi(w) \tag{34} \]

where $R = \hat{\Gamma} + D_\alpha$ and where $\psi(w) = (\beta_1 w_1, \ldots, \beta_N w_N)$. The Split-Bregman
algorithm applied to (34) is

Algorithm 1 Split Bregman Algorithm for solving (34)

Initialize: $k = 1$, $b^k = 0$, $w^k = 0$, $d^k = 0$
while $\|w^k - w^{k-1}\|_{\ell_2} > tol$ do
    $w^{k+1} = \arg\min_w \; w^T R w - w^T \hat{\mu} + \frac{\lambda}{2} \|d^k - \psi(w) - b^k\|_{\ell_2}^2$
    $d^{k+1} = \arg\min_d \; \frac{\lambda}{2} \|d - \psi(w^{k+1}) - b^k\|_{\ell_2}^2 + \|d\|_{\ell_1}$
    $b^{k+1} = b^k + \psi(w^{k+1}) - d^{k+1}$
    $k = k + 1$
end while


Both inner optimization problems in Algorithm 1 have closed form
solutions. The first problem is an unconstrained strictly convex quadratic
program and the second problem can be solved using the shrinkage operator

\[ d_i^{k+1} = \mathrm{shrink}\left( \beta_i w_i^{k+1} + b_i^k, \frac{1}{\lambda} \right) \]

where

\[ \mathrm{shrink}(x, \gamma) = \frac{x}{|x|} \max(|x| - \gamma, 0). \]

The stopping criterion in Algorithm 1 does not ensure that the objective
value is within a desired tolerance. A modification to the algorithm can be
made to ensure that this occurs. One such modification uses Theorem 3 to
derive a stopping criterion.
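
The iteration of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration assuming NumPy, with hypothetical function names and default parameters. The $w$ update solves the linear system obtained by setting the gradient of the first sub-problem to zero, $(2R + \lambda\,\mathrm{diag}(\beta^2))\,w = \hat{\mu} + \lambda \beta \circ (d - b)$, and the $d$ update applies the shrinkage operator. The example uses a diagonal $R$ because the solution is then known in closed form, $w_i = \mathrm{shrink}(\hat{\mu}_i, \beta_i) / (2 R_{i,i})$.

```python
import numpy as np

def shrink(x, gamma):
    """Soft-thresholding (x/|x|) * max(|x| - gamma, 0), applied elementwise."""
    return np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)

def split_bregman(r, mu_hat, beta, lam=1.0, tol=1e-10, max_iter=10000):
    """Split-Bregman iteration for min w'Rw - w'mu + sum beta_i |w_i|."""
    n = mu_hat.shape[0]
    w, d, b = np.zeros(n), np.zeros(n), np.zeros(n)
    lhs = 2.0 * r + lam * np.diag(beta ** 2)      # fixed system matrix
    for _ in range(max_iter):
        w_new = np.linalg.solve(lhs, mu_hat + lam * beta * (d - b))
        d = shrink(beta * w_new + b, 1.0 / lam)   # closed-form d update
        b = b + beta * w_new - d                  # Bregman variable update
        converged = np.linalg.norm(w_new - w) <= tol
        w = w_new
        if converged:
            break
    return w, d, b

r = np.eye(3)                          # diagonal R, solution known in closed form
mu_hat = np.array([1.0, 0.1, -0.6])
beta = np.full(3, 0.2)
w, d, b = split_bregman(r, mu_hat, beta)
# w is approximately (0.4, 0.0, -0.2); the second asset is thresholded out.
```

Note that the second weight converges to zero but is not exactly zero in `w`, which is the practical issue that the thresholded stopping rule of Theorem 3 addresses; the auxiliary variable `d` is exactly sparse.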
Algorithm 2 Modified Split Bregman Algorithm for solving (34)

Initialize: $k = 0$, $b^k$, $w^k$, $d^k = |w^k|$, $tol > 0$
while $w^k$ does not satisfy the conditions of Theorem 3 for $\epsilon = \frac{2}{(\sqrt{2}+1)^2} tol$ and $w = w^k$ do
    $w^{k+1} = \arg\min_w \; w^T R w - w^T \hat{\mu} + \frac{\lambda}{2} \|d^k - \psi(w) - b^k\|_{\ell_2}^2$
    $d^{k+1} = \arg\min_d \; \frac{\lambda}{2} \|d - \psi(w^{k+1}) - b^k\|_{\ell_2}^2 + \|d\|_{\ell_1}$
    $b^{k+1} = b^k + \psi(w^{k+1}) - d^{k+1}$
    $k = k + 1$
end while
Output $\hat{w}$ and $d^k$ where $\hat{w}$ is defined as in Theorem 3 using $\epsilon = \frac{2}{(\sqrt{2}+1)^2} tol$ and $w = w^k$.

By Theorem 3 this algorithm ensures that the objective value is within
$tol$ of the optimal value.

3.3 Adaptive Support Split Bregman 

The first sub-problem in Algorithms 1 and 2 involves solving an $N \times N$
system of equations. When the number of assets is large, completing this step
becomes computationally expensive. This is especially true for financial data
where the covariance matrix is ill-conditioned and dense. Thus Algorithms 1
and 2 may be impractical in applications where real-time results are required
or computational performance is limited.

It is well known [4] that portfolio optimization problems with an $\ell_1$
regularization term can result in sparse portfolios, i.e. the solution of (15) is
only non-zero in a small number of indices. Figure 1 illustrates this behavior
by showing the portfolio weights for 1600 assets obtained using the criterion
in (15). For this example less than 11% of the assets have a non-zero weight.

Sparsity of the portfolio weights can be exploited to reduce computational
complexity. To see this suppose $w^*$ solves (15) and $I = \mathrm{supp}(w^*)$ is
known a priori (before computing the solution). Then the problem (15) can
be reduced to the equivalent problem

\[ \min_{w} \; w^T R|_I w - w^T \hat{\mu}|_I + \|w\|_{\beta,\ell_1} \]

where $R|_I$ and $\hat{\mu}|_I$ represent the covariance and mean restricted to $I$. This
problem is of dimension $|I|$ and requires fewer operations to compute per
iteration. This suggests that an Adaptive Support Split-Bregman Algorithm
which attempts to solve (15) on smaller subspaces, $I$, where $\mathrm{supp}(w^*) \subset I$
can save computational time.

Figure 1: Elastic net penalty promotes sparsity in the portfolio weights
(plot title: Portfolio Weights with Elastic Net Penalty)

To develop an effective algorithm we first derive an optimality condition 
which can be used as a stopping criterion. 

Lemma 4 $w^*$ solves (15) if and only if $|(2Rw^*)_i - \hat{\mu}_i| \le \beta_i$ for all $i \notin
\mathrm{supp}(w^*)$ and $(2Rw^*)_i - \hat{\mu}_i + \beta_i \, \mathrm{sgn}(w_i^*) = 0$ for all $i \in \mathrm{supp}(w^*)$.


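Proof 3 Suppose $w^*$ solves (15) and let $i \in \mathrm{supp}(w^*)$. Then since $w^*$ is
optimal and $w_i^* \ne 0$ the partial derivative of the objective function with respect
to $w_i$ exists and is equal to 0. Thus

\[ 0 = \frac{\partial}{\partial w_i} \Psi(w) \Big|_{w = w^*} = 2 (Rw^*)_i - \hat{\mu}_i + \beta_i \, \mathrm{sgn}(w_i^*). \]

Now suppose $i \notin \mathrm{supp}(w^*)$. In this case the partial derivative of the objective
function does not exist. However by optimality we have

\[ 0 \in \partial \Psi(w^*). \]

Thus

\[ \lim_{h \downarrow 0} \frac{\Psi(w^* + h\delta_i) - \Psi(w^*)}{h} \ge 0 \]

and

\[ \lim_{h \uparrow 0} \frac{\Psi(w^* + h\delta_i) - \Psi(w^*)}{h} \le 0 \]

which imply

\[ (2Rw^*)_i - \hat{\mu}_i \ge -\beta_i \]

and

\[ (2Rw^*)_i - \hat{\mu}_i \le \beta_i. \]

For the converse suppose that $|(2Rw^*)_i - \hat{\mu}_i| \le \beta_i$ for all $i \notin \mathrm{supp}(w^*)$ and
$(2Rw^*)_i - \hat{\mu}_i + \beta_i \, \mathrm{sgn}(w_i^*) = 0$ for all $i \in \mathrm{supp}(w^*)$. Choose $\epsilon = \min\{ |w_i^*| : i \in
\mathrm{supp}(w^*) \}$. Then for any $w$ such that $\|w - w^*\|_\infty < \epsilon$

\[ \Psi(w) - \Psi(w^*) \ge \sum_{i \in \mathrm{supp}(w^*)} \left( (2Rw^*)_i - \hat{\mu}_i + \beta_i \, \mathrm{sgn}(w_i) \right) (w_i - w_i^*)
   + \sum_{i \notin \mathrm{supp}(w^*)} \left( \left( (2Rw^*)_i - \hat{\mu}_i \right) w_i + \beta_i |w_i| \right) \ge 0. \]

Thus $w^*$ is locally optimal which, by convexity of $\Psi$, implies global optimality.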

Lemma 4 can be used to derive a criterion for determining which indices
in a portfolio, $x$, belong in the support. For example, suppose that $i \notin
\mathrm{supp}(x)$, and $|(2Rx)_i - \hat{\mu}_i| > \beta_i$. Then the objective function in (15) can be
reduced by adding $i$ into $\mathrm{supp}(x)$. Thus $x$ is not optimal and we should
incorporate $i$ into $\mathrm{supp}(x)$.
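
The support test suggested by Lemma 4 can be sketched as follows. The code assumes NumPy; the function name, the numerical tolerance `tol` and the example data are hypothetical. It returns the indices outside the support that violate the bound, together with the residuals of the stationarity condition on the support.

```python
import numpy as np

def lemma4_check(x, r, mu_hat, beta, tol=1e-8):
    """Check the optimality conditions of Lemma 4 at a portfolio x."""
    grad = 2.0 * r @ x - mu_hat                    # (2Rx)_i - mu_i
    on_support = np.abs(x) > 0
    # Indices outside the support that should be added to it.
    candidates = np.flatnonzero(~on_support & (np.abs(grad) > beta + tol))
    # On the support the condition is grad_i + beta_i * sgn(x_i) = 0.
    residuals = np.abs(grad[on_support] + beta[on_support] * np.sign(x[on_support]))
    return candidates, residuals

r = np.eye(3)
mu_hat = np.array([1.0, 0.1, -0.6])
beta = np.full(3, 0.2)

# At the zero portfolio, assets 0 and 2 violate the bound and should enter the support.
cand_zero, _ = lemma4_check(np.zeros(3), r, mu_hat, beta)

# At the optimal portfolio there are no candidates and the residuals vanish.
cand_opt, res_opt = lemma4_check(np.array([0.4, 0.0, -0.2]), r, mu_hat, beta)
```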

Next we look at how to prolongate the Split Bregman variables (w, d, b ) 
from a lower dimensional space to a higher dimensional space. Prolongation 
of w and d can be achieved through simple zero filling. Prolongation of b is 
more delicate. The following Lemma suggests an effective prolongation. 

Lemma 5 Suppose $(w^*, d^*)$ is the solution of (34) obtained with Algorithm 1.
Then

\[ \lim_{k \to \infty} b_i^k = -(2Rw^* - \hat{\mu})_i / (\beta_i \lambda). \]

Proof 4 By Algorithm 1 we have for all $k$

\[ 2 (Rw^{k+1})_i - \hat{\mu}_i - \lambda \left( d^k - \psi(w^{k+1}) - b^k \right)_i \beta_i = 0. \tag{35} \]




WEIGHTED ELASTIC NET PENALIZED PORTFOLIOS 




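Since $\lim_{k \to \infty} w^k = w^*$ and $\lim_{k \to \infty} d^k = d^*$ and $d^* = \psi(w^*)$ we have

\[ \lim_{k \to \infty} 2 (Rw^{k+1})_i - \hat{\mu}_i + \lambda (b^k)_i \beta_i = 0 \]

which implies that

\[ \lim_{k \to \infty} (b^k)_i = \frac{\hat{\mu}_i - 2 (Rw^*)_i}{\beta_i \lambda}. \]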


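This suggests that the prolongation of $b$ can be defined from equation
(35). For example suppose $(w, d, b)$ solves (34) on a restricted domain $I \subset
\{1, 2, \ldots, N\}$ and let $\tilde{w}$ and $\tilde{d}$ represent the prolongation of $w$ and $d$ to a set
$J \supset I$, i.e.

\[ \tilde{w}_j = \begin{cases}
w_j & \text{if } j \in I \\
0 & \text{if } j \in J - I
\end{cases} \tag{36} \]

\[ \tilde{d}_j = \begin{cases}
d_j & \text{if } j \in I \\
0 & \text{if } j \in J - I.
\end{cases} \tag{37} \]

Then taking a cue from equation (35) the prolongation of $b$ may be defined
as

\[ \tilde{b}_i = \left( -2 R|_J \tilde{w} + \hat{\mu}|_J \right)_i / (\beta_i \lambda). \tag{38} \]

The Adaptive Support Split Bregman Algorithm for solving (34) is given
below.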


Algorithm 3 Adaptive Support Split Bregman Algorithm for solving (34)

Initialize: $k = 0$, $w^0 = 0$, $d^0 = 0$, $b^0 = 0$, $\epsilon > 0$, $M > 0$
Define $D^0 = 2Rw^0 - \hat{\mu}$
while $|D_i^k| > \beta_i$ for any $i \notin \mathrm{supp}(w^k)$ AND $k < N$ do
    Define the set $J^k = \{ D_i^k : i \notin \mathrm{supp}(w^k) \}$
    Set $K = M \vee (k + 1 - |\mathrm{supp}(w^k)|)$
    Set $\tilde{J}^k$ equal to the largest $K$ elements in $J^k$
    Set $I^k = \tilde{J}^k \cup \mathrm{supp}(w^k)$
    Run Algorithm 2 on $I^k$ with initialization $w^k|_{I^k}$, $b^k|_{I^k}$, $d^k|_{I^k}$ and tolerance $\epsilon$
    Set $(w^{k+1}, d^{k+1})$ to the prolongation of the output of the previous step
    Set $b_i^{k+1} = -(2Rw^{k+1} - \hat{\mu})_i / (\beta_i \lambda)$
    Set $D^{k+1} = 2Rw^{k+1} - \hat{\mu}$
    $k = k + 1$
end while
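
The outer loop of Algorithm 3 can be sketched in Python as follows. This is a simplified illustration assuming NumPy, not the authors' implementation: all function names, the parameter `slack` and the default values are hypothetical, the restricted solver stops on the change in $w$ and the constraint residual instead of the Theorem 3 test used by Algorithm 2, and it returns $d / \beta$ so that excluded weights are exactly zero. Candidates are ranked by $|D_i^k|$, and the Bregman variable is warm-started as in (38).

```python
import numpy as np

def shrink(x, gamma):
    return np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)

def restricted_split_bregman(r, mu, beta, w, d, b, lam, tol, max_iter=20000):
    """Split-Bregman iterations on a restricted index set, warm-started at (w, d, b)."""
    lhs = 2.0 * r + lam * np.diag(beta ** 2)
    for _ in range(max_iter):
        w_new = np.linalg.solve(lhs, mu + lam * beta * (d - b))
        d = shrink(beta * w_new + b, 1.0 / lam)
        b = b + beta * w_new - d
        change = np.linalg.norm(w_new - w)
        gap = np.linalg.norm(beta * w_new - d)     # residual of d = psi(w)
        w = w_new
        if max(change, gap) <= tol:
            break
    return d / beta                                # exactly sparse weights

def adaptive_support_split_bregman(r, mu_hat, beta, lam=1.0, tol=1e-10, m=1, slack=1e-8):
    n = mu_hat.shape[0]
    w = np.zeros(n)
    for k in range(n):                             # at most N outer iterations
        grad = 2.0 * r @ w - mu_hat                # D^k in Algorithm 3
        off = np.flatnonzero(w == 0.0)
        if not np.any(np.abs(grad[off]) > beta[off] + slack):
            break                                  # Lemma 4 holds off the support
        supp = np.flatnonzero(w != 0.0)
        n_add = max(m, k + 1 - supp.size)          # K = M v (k + 1 - |supp|)
        ranked = off[np.argsort(-np.abs(grad[off]))]
        idx = np.union1d(supp, ranked[:n_add])     # enlarged support I^k
        w_i = w[idx]                               # zero-filled prolongation of w
        d_i = beta[idx] * w_i                      # d = psi(w)
        b_i = -grad[idx] / (beta[idx] * lam)       # prolongation of b, as in (38)
        w_sub = restricted_split_bregman(r[np.ix_(idx, idx)], mu_hat[idx],
                                         beta[idx], w_i, d_i, b_i, lam, tol)
        w = np.zeros(n)
        w[idx] = w_sub
    return w

r = np.eye(3)
mu_hat = np.array([1.0, 0.1, -0.6])
beta = np.full(3, 0.2)
w_adaptive = adaptive_support_split_bregman(r, mu_hat, beta)
# Approximately (0.4, 0.0, -0.2); asset 1 never enters the support.
```

In this small example the solver only ever works on one- and two-asset sub-problems, which is the source of the savings when the optimal portfolio is sparse.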


The next theorem shows that Algorithm 3 converges.
Theorem 6 Let $w^*$ be the optimal solution to (15) and let $w'$ be a solution
produced by Algorithm 3 for $\epsilon = tol$. Then

\[ \Psi(w') \le \Psi(w^*) + tol. \tag{39} \]

Proof 5 By design the algorithm terminates after at most $N$ iterations.
Suppose the algorithm terminates in $k < N$ iterations. Let $I^{(k)}$ be the support
in iteration $k$ of the Adaptive Support Split-Bregman algorithm. Then by
the proof of Theorem 0 $w'$ satisfies the conditions of Theorem 3 with $\epsilon =
tol$. Thus by Theorem 3 $\Psi(w') \le \Psi(w^*) + tol$. Now suppose the algorithm
terminates in $N$ iterations. Since $I^{(N-1)}$ contains all asset indices it follows
by the design of Algorithm 2 that $\Psi(w') \le \Psi(w^*) + tol$.

To evaluate the execution speed of the Adaptive Support Split-Bregman
algorithm we performed a comparison with the following fast algorithms
described in the literature: Split-Bregman algorithm (Algorithm 2), FISTA
[2] and Multilevel Iterated-Shrinkage [37]. To the best of our knowledge
these algorithms are considered state of the art for large-scale $\ell_1$-penalized
quadratic programs. For the multi-level algorithm proposed in [37] we use
the FISTA [2] algorithm for all relaxations and lowest level solvers. To
make a fair comparison we have used the same error tolerance of $10^{-6}$ for
each algorithm.

Tables 1 and 2 present MATLAB run times for solving (15) for a large 
and a small basket of US stocks. The machine running the simulation has the 
Windows 7 operating system and an Intel i7-3740 processor with 32.0 GB 
of RAM. 


Table 1: Adaptive Support Split-Bregman converges quickly to a solution 
for sparse portfolios (run times in seconds) 

Dimension  Sparsity  Adaptive Support  Split-Bregman  FISTA  Multi-level 
           Level     Split-Bregman                           FISTA [37] 
2000       88        0.1               20.6           0.4    0.2 
2000       142       0.2               14.5           0.8    0.2 
2000       450       0.9               14.6           3.6    1.5 
2000       853       4.8               23.0           8.8    9.2 
2000       1692      10.4              38.0           21.4   22.7 
3000       237       0.3               48.2           12.9   2.7 
3000       805       1.3               49.9           55.7   24.6 
4000       234       0.5               107.6          24.6   2.2 


In Table 1 we see that the Adaptive Support Split-Bregman Algorithm 
typically converges much faster than Split-Bregman, FISTA and Multi-Level 
FISTA for sparse portfolios taken from a large set of assets (for example, 
0.5 sec versus 107.6, 24.6 and 2.2 sec for dimension 4000 and sparsity 
level 234). On the other hand Tables 1 and 2 show that the advantage of the 
Adaptive Support Split Bregman algorithm decreases when the cardinality of 
the asset set is small or when the support of the portfolio is large. 


Table 2: Benefit of Adaptive Support Split-Bregman decreases when 
dimensionality is small (run times in seconds) 

Dimension  Sparsity  Adaptive Support  Split-Bregman  FISTA  Multi-level 
           Level     Split-Bregman                           FISTA [37] 
500        53        0.03              0.8            0.02   0.02 
500        150       0.09              0.6            0.04   0.03 
500        261       0.2               0.5            0.2    0.2 


4 Experimental Results 

In this section we quantify the performance benefit of using a weighted 
elastic net penalty by testing our criterion in (15) on daily return data from 
630 U.S. stocks collected between January 1, 2001 and July 1, 2014 with 
market capitalization greater than 4 billion US dollars. The results are then 
compared with other portfolio selection criteria described in Section 2 and 
the naive equal-weighted portfolio. 

In our experiments we compute new portfolios every 63 trading days 
using daily returns from the prior 252 trading days as training data for 
parameter estimation and calibration of the elastic net weights. Our criterion 
for evaluating the portfolio performance is the out-of-sample Sharpe ratio of 
the daily portfolio returns. The Sharpe ratio is defined as the portfolio's 
excess return divided by its standard deviation. The formula used for 
computing the Sharpe ratio is given below: 

$SR = \dfrac{\frac{1}{\tau}\sum_{i=1}^{\tau} w(t_i)^T r(t_i)}{\sqrt{\frac{1}{\tau}\sum_{i=1}^{\tau}\left(w(t_i)^T r(t_i) - \frac{1}{\tau}\sum_{j=1}^{\tau} w(t_j)^T r(t_j)\right)^2}}$ (40) 

where $\tau$ is the total number of trading days in our 13.5 year data set. Here 
$w(t_i)$ is the portfolio on day $t_i$, which is computed from the previous set of 
training data and remains fixed over intervals of 63 trading days. 
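
The rolling evaluation described above can be sketched in a few lines of Python. This is an illustrative sketch rather than the code used for the experiments: `weight_fn` is a hypothetical callable that maps a training window to portfolio weights, `equal_weight` is included only as a simple example of such a callable, and the ratio is computed over the out-of-sample days that remain after the first training window.

```python
import numpy as np

def out_of_sample_sharpe(returns, weight_fn, n_train=252, n_hold=63):
    """returns: (T, N) array of daily asset returns.
    weight_fn: hypothetical callable, (n_train, N) training window -> N weights."""
    T = returns.shape[0]
    daily = []
    for start in range(n_train, T, n_hold):
        w = weight_fn(returns[start - n_train:start])  # fit on the prior window only
        block = returns[start:start + n_hold]          # next (up to) n_hold days
        daily.append(block @ w)                        # weights held fixed in the block
    daily = np.concatenate(daily)
    return daily.mean() / daily.std()                  # population std (ddof=0)

def equal_weight(train):
    """The 1/N portfolio, used here as an example weight_fn."""
    n = train.shape[1]
    return np.full(n, 1.0 / n)
```

Any of the penalized criteria could be plugged in as `weight_fn`, provided it only uses the training window it is given, so that the resulting daily returns are genuinely out-of-sample.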

4.1 Parameter Estimation and Calibration 

Due to the large number of assets and small amount of training data, 
estimation of the covariance and mean in our experiments is performed using 
shrinkage techniques [10]. We estimate the covariance matrix using the 
technique described in [24]. In that paper the following shrinkage estimator 
for $\Gamma$ is proposed 

$\hat{\Gamma} = \rho_1 \Sigma_S + \rho_2 I$ (41) 

where $\Sigma_S$ is the sample average covariance obtained from the training data 
and where $\rho_1, \rho_2 > 0$. In our experiments we use the optimal values of 
$\rho_1$ and $\rho_2$ which are derived in [24]. Note that this choice of shrinkage 
target guarantees that $\hat{\Gamma}$ will be positive definite. 

Since the weighted elastic net penalty consists of a squared weighted $\ell_2$ 
norm, the shrinkage in (41) may appear to be redundant when applied with 
the weighted elastic net regularization in Section 2.3. However, this is not 
the case since the weights on the weighted elastic net and the shrinkage 
parameters in (41) are adaptively selected according to different criteria. 
Thus the covariance shrinkage target becomes a combination of the 
bootstrap derived target and the target derived according to [21]. One benefit 
of this approach is that there will always be some level of $\ell_2$ regularization 
regardless of what the bootstrap criterion derives. 

For estimation of the mean we employ a James-Stein estimator mm 
which was proposed for portfolio optimization in [23]. When applying the 
James-Stein approach we compute the estimate of $\mu$ using the equation 

$\hat{\mu} = (1 - \rho)\mu_S + \rho\eta\mathbf{1}$. (42) 

Here $\mu_S$ is the sample mean vector and $\eta$ is the maximum of the average of 
the sample means and the daily historical return of the US stock market 
between 1928 and 2000 [6], 

$\eta = \max\left\{\frac{1}{N}\mathbf{1}^T\mu_S,\ 0.0004\right\}$. (43) 

The value of $\rho$ is set according to [23] as 


p = min 


1 , 


Htrain 


(N- 2 ) _ 

(fj-s -rjl) 


(44) 
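
As an illustration of the two shrinkage steps in (41) and (42), the following Python sketch applies them to given sample estimates. It is a sketch under stated assumptions, not the estimation code used in the experiments: the shrinkage intensities `rho1`, `rho2` and `rho` are taken as inputs rather than computed from the formulas in [24] and [23], and the function names are hypothetical.

```python
import numpy as np

def shrink_covariance(sample_cov, rho1, rho2):
    """Shrink the sample covariance toward the identity, as in (41).
    Assumption: rho1, rho2 > 0 are supplied by the caller."""
    n = sample_cov.shape[0]
    return rho1 * sample_cov + rho2 * np.eye(n)

def shrink_mean(sample_mean, rho, market_return=0.0004):
    """Shrink the sample mean toward a common target eta, as in (42).
    eta is the larger of the average sample mean and the historical
    daily market return quoted in the text."""
    eta = max(sample_mean.mean(), market_return)
    return (1.0 - rho) * sample_mean + rho * eta * np.ones_like(sample_mean)
```

With fewer training days than assets the sample covariance is singular, whereas the shrunk estimate has all eigenvalues bounded below by `rho2`, which is why the identity target guarantees positive definiteness.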


The weights for the weighted elastic net penalty are calibrated using the 
bootstrap technique described in Section 2.5 with identical estimation risk 
aversion factors for mean and squared volatility, i.e. $p_1 = p_2$. Calibration of 
the weighted LASSO penalty is performed using the technique described in 
[15]. Since the weighted LASSO calibration in [15] is only defined up to a 
constant we perform a parametric study for various constants. Calibration 
of the elastic net penalty is handled using the technique described in section 
1.6.2 of [38]. The calibration method in [38] only determines the sum 
$\lambda_1 + \lambda_2$ in (17); the relative weighting of $\lambda_1$ and $\lambda_2$ is not addressed. 
Thus we perform a parametric analysis over the relative weighting between 
the parameters $\lambda_1$ and $\lambda_2$ in the elastic net. For SCAD there are no known 
calibration methods. Hence for SCAD we perform a parametric study for 
various $\lambda$ values and a fixed $a_{SCAD}$ parameter of 3.7 as suggested in [13]. 

4.2 Sharpe Ratio performance 

In this section we present performance results for the following 5 
mean-variance criteria: 1) unpenalized, 2) weighted elastic net penalized, 
3) weighted LASSO penalized [15], 4) elastic net penalized [39], and 5) SCAD 
penalized. As a comparison case we also tested the 1/N equal weighted 
portfolio. 


Figure 2: Weighted Elastic Net performance (Sharpe ratio) as a function of 
the estimation risk aversion factor 

In Figure 2 we examine the Sharpe ratios of the weighted elastic net 
penalty as a function of estimation risk aversion factor, i.e. bootstrap 
percentile. As a comparison the performance of the 1/N and unpenalized 
portfolios is also shown. The figure demonstrates that the weighted elastic 
net penalized criterion with bootstrap calibration improves Sharpe ratio 
performance over the 1/N and unpenalized portfolios when the estimation risk 
aversion factor is between 0.5 and 0.95. Outside of this interval the weighted 
elastic net penalty did not improve performance, which suggests that a 
moderate amount of estimation risk aversion is optimal. 



Figure 3: Quartiles for $\alpha$ weights as a function of the estimation risk 
aversion factor 

In Figures 3 and 4 we present the quartiles of the $\alpha$ and $\beta$ parameters 
obtained from our bootstrap technique as a function of the estimation risk 
aversion factors. The values increase sharply when moving from an aversion 
factor of 0.95 to 1.0. This may explain the dramatic loss in performance 
from 0.95 to 1.0 in Figure 2. 

Figure 4: Quartiles for $\beta$ weights as a function of the estimation risk 
aversion factor 

For comparison purposes the Sharpe ratios of the weighted LASSO, elastic 
net and SCAD penalized portfolios are shown in Figures 5, 6 and 7 as 
a function of their respective penalty scaling parameters. We see that 
neither the weighted LASSO nor the elastic net performs as well as the 
weighted elastic net penalty. This could be a consequence of their 
calibration being derived from a minimum variance perspective. The SCAD 
penalized portfolio performs comparably to the weighted elastic net penalty 
if the $\lambda$ parameter is chosen correctly. However, how to automate the 
selection of an optimal $\lambda$ in the SCAD penalty for portfolio optimization 
problems remains an open question. 

5 Conclusions and Generalizations 

In this paper the addition of a weighted elastic net penalty to the 
mean-variance objective function has been proposed in order to improve 
out-of-sample portfolio performance when parameter estimates are uncertain. 
We have shown that this approach can be motivated by reformulating the 
mean-variance criterion as a robust optimization problem. With this view we 
develop a data-driven criterion for calibration of the elastic net weights 
based on bootstrapping and an investor's aversion to model estimation risk. 
To compute the portfolio weights efficiently we proposed a novel Adaptive 
Support Split-Bregman algorithm for solving our proposed optimization 
criterion. This technique exploits the sparsity promoting properties of the 
weighted elastic net penalty to reduce computational requirements. 

Figure 5: Weighted LASSO performance (Sharpe ratio of the weighted LASSO 
penalized mean-variance criterion) as a function of penalty normalization 
factor. Calibration of relative weight values performed using technique 
in [15]. 

Our experimental results demonstrate that using the weighted elastic net 
penalty and calibration approach can result in a higher out-of-sample Sharpe 
ratio than the other norm penalization techniques designed for minimum 
variance portfolios. In addition, our MATLAB run-time results indicate 
that, for sparse portfolios drawn from a large asset set, the proposed 
Adaptive Support Split-Bregman algorithm significantly reduces computation 
time compared with other algorithms such as Split-Bregman and FISTA. 

An interesting question raised by this paper is whether the more general 
pairwise elastic net penalty in (19) will provide greater performance 
enhancement than the weighted elastic net penalty. The pairwise penalty 
appears promising since it is derived from a more flexible model where 
uncertainty in the off-diagonal of $\Gamma$ is allowed. However, the pairwise 
elastic net requires specification of up to $N(N-1)/2$ more uncertainty 
parameters than the weighted elastic net. In addition, numerical algorithms 
for computing solutions to (19) have not been extensively reported on in the 
literature. We plan to investigate these questions in future work. 

Figure 6: Elastic Net performance (Sharpe ratio of the elastic net penalized 
mean-variance criterion) as a function of difference of the $\ell_1$ and 
squared $\ell_2$ weights. Calibration performed using the technique in [38]. 
Figure 7: SCAD performance (Sharpe ratio of the SCAD penalized mean-variance 
criterion) as a function of the $\lambda$ parameter. $a_{SCAD}$ parameter fixed 
to 3.7 

A Proofs of Theorems 2 and 3 

In this section we provide proofs for Theorems 2 and 3. To facilitate the 
proof we will first reformulate the criterion in (15) as a quadratic program. 

A.1 Quadratic Program Reformulation 

Problem (15) can be reformulated as a quadratic program with linear 
inequality constraints by introducing an auxiliary variable $d$, 

$\min_{w,d} \Phi(w,d)$ (45) 
s.t. $-d_i \le w_i$ 
     $-d_i \le -w_i$ 

where $\Phi(w,d) = w^T R w - w^T\hat{\mu} + \sum_{i=1}^{N}\beta_i d_i$ and where 
$R = \hat{\Gamma} + D_\alpha$. The Lagrangian for this problem 

$L(w,d,\lambda) = w^T R w - w^T\hat{\mu} + \sum_{i=1}^{N}\beta_i d_i + \sum_{i=1}^{N}\lambda_i(-d_i - w_i) + \sum_{i=1}^{N}\lambda_{i+N}(-d_i + w_i)$ (46) 

plays an important role in our subsequent analysis in the next section. 


A.2 Approximate Optimality Proofs 

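Here we prove Theorems 2 and 3 using the quadratic program reformulation 
(45). Our first task is to derive a lower bound on the Lagrangian for a fixed 
$\lambda$ and when $d = |w|$. First note that $R$ is symmetric positive definite 
with smallest eigenvalue $\ge \alpha_0$ where 

$\alpha_0 = \min\{\alpha_i : 1 \le i \le N\}$. 

Thus for $d_i = |w_i|$, $\bar{d}_i = |\bar{w}_i|$ and $\lambda \ge 0$ we have 

$\Phi(w,d) \ge L(w,d,\lambda)$ 
$\quad = L(\bar{w},\bar{d},\lambda) + \nabla_w L(\bar{w},\bar{d},\lambda)^T(w - \bar{w}) + \nabla_d L(\bar{w},\bar{d},\lambda)^T(d - \bar{d}) + \tfrac{1}{2}(w - \bar{w})^T H_w(\bar{w},\bar{d},\lambda)(w - \bar{w})$ 
$\quad \ge L(\bar{w},\bar{d},\lambda) + \nabla_w L(\bar{w},\bar{d},\lambda)^T(w - \bar{w}) + \nabla_d L(\bar{w},\bar{d},\lambda)^T(d - \bar{d}) + \alpha_0\|w - \bar{w}\|_{\ell_2}^2$ 
$\quad \ge L(\bar{w},\bar{d},\lambda) + \nabla_w L(\bar{w},\bar{d},\lambda)^T(w - \bar{w}) + \nabla_d L(\bar{w},\bar{d},\lambda)^T(d - \bar{d}) + \tfrac{1}{2}\alpha_0\|w - \bar{w}\|_{\ell_2}^2 + \tfrac{1}{2}\alpha_0\|d - \bar{d}\|_{\ell_2}^2$ (47) 

where $H_w = 2R$ is the Hessian of $L$ with respect to the $w$ variables. The 
last inequality uses $\|d - \bar{d}\|_{\ell_2} \le \|w - \bar{w}\|_{\ell_2}$, which holds because 
$d = |w|$ and $\bar{d} = |\bar{w}|$. 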

We now present two lemmas which will be useful in deriving a stopping 
criterion. Our first lemma gives an upper bound for L when the gradient of 
L is small. 


Lemma 7 Suppose $d_i = |w_i|$ for all $i$ and $\|\nabla_{w,d}L(w,d,\lambda)\|_{\ell_2} \le \sqrt{2\epsilon\alpha_0}$. Then 
$L(w,d,\lambda) \le \Phi(w^*,d^*) + \epsilon$ where $w^*$ solves (15) and $d_i^* = |w_i^*|$ for all $i$. 


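Proof 6 By equation (47) we have 

$\Phi(w^*,d^*) \ge L(w^*,d^*,\lambda) \ge L(w,d,\lambda) + \nabla_w L(w,d,\lambda)^T(w^* - w) + \nabla_d L(w,d,\lambda)^T(d^* - d) + \tfrac{1}{2}\alpha_0\|w^* - w\|_{\ell_2}^2 + \tfrac{1}{2}\alpha_0\|d^* - d\|_{\ell_2}^2$. 

The right-hand side is bounded below by its minimum over the differences, 
which is attained by substituting $-\frac{1}{\alpha_0}\nabla_d L(w,d,\lambda)$ in for $(d^* - d)$ 
and $-\frac{1}{\alpha_0}\nabla_w L(w,d,\lambda)$ in for $(w^* - w)$. With these substitutions we obtain 

$\Phi(w^*,d^*) \ge L(w,d,\lambda) - \frac{1}{2\alpha_0}\|\nabla_{w,d}L(w,d,\lambda)\|_{\ell_2}^2$ 
$\quad \ge L(w,d,\lambda) - \epsilon$. 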
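
The minimization step used in the proof can be written out explicitly. The following LaTeX fragment is a worked sketch in which $g$ stands for the stacked gradient $\nabla_{w,d}L(w,d,\lambda)$ and $x$ for the stacked difference $(w^* - w,\ d^* - d)$; both symbols are introduced here only for illustration.

```latex
% For a fixed vector g and \alpha_0 > 0, minimize over x:
\[
  q(x) = g^{T}x + \tfrac{1}{2}\alpha_0\|x\|_{\ell_2}^{2},
  \qquad \nabla q(x) = g + \alpha_0 x = 0
  \;\Longrightarrow\; x^{\star} = -\tfrac{1}{\alpha_0}\,g,
\]
\[
  q(x^{\star}) = -\tfrac{1}{\alpha_0}\|g\|_{\ell_2}^{2}
               + \tfrac{1}{2\alpha_0}\|g\|_{\ell_2}^{2}
             = -\tfrac{1}{2\alpha_0}\|g\|_{\ell_2}^{2}
             \;\ge\; -\epsilon
  \quad\text{whenever } \|g\|_{\ell_2} \le \sqrt{2\epsilon\alpha_0}.
\]
```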

The next lemma can be verified easily. 

Lemma 8 Suppose $|a| \le b$. Then there exist $x_1, x_2 \ge 0$ such that 

$x_1 + x_2 = b$ 

$-x_1 + x_2 = a$. 
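
For completeness, the verification can be written in two lines. In this sketch the two unknowns of the lemma are written as $x_1$ and $x_2$, and the two linear equations are solved directly.

```latex
\[
  x_1 = \frac{b-a}{2}, \qquad x_2 = \frac{b+a}{2}
  \quad\Longrightarrow\quad
  x_1 + x_2 = b, \qquad -x_1 + x_2 = a,
\]
\[
  x_1 \ge 0 \ \text{and}\ x_2 \ge 0
  \iff -b \le a \le b
  \iff |a| \le b .
\]
```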


A.2.1 Proof of Theorem 2 

We are now ready to prove Theorem 2 which establishes a condition for 
approximate optimality of a portfolio under the weighted elastic net criterion 
(15). 


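Proof 7 of Theorem 2 

Choose $d^*$ and $d$ such that $d_i^* = |w_i^*|$ and $d_i = |w_i|$. For $i \in \mathrm{supp}(w)$ define 
$\lambda_i$ such that 

$\lambda_i = 0$ if $w_i > 0$, $i \in \mathrm{supp}(w)$ 
$\lambda_i = \beta_i$ if $w_i < 0$, $i \in \mathrm{supp}(w)$ 

and for $i \in \mathrm{supp}(w)$ define $\lambda_{i+N} = \beta_i - \lambda_i$. 

For $i \notin \mathrm{supp}(w)$ we want to define $\lambda_i$ and $\lambda_{i+N}$ such that $\lambda_i \ge 0$, 
$\lambda_{i+N} \ge 0$, 

$\lambda_i + \lambda_{i+N} = \beta_i$ (48) 

and 

$-\lambda_i + \lambda_{i+N} = -\frac{\partial}{\partial w_i}\left(w^T R w - w^T\hat{\mu}\right)$. (49) 

By Lemma 8 equation PUD implies that such a $\lambda_i, \lambda_{i+N}$ exists. 

Let us form the Lagrangian $L(w,d,\lambda)$ as in equation (46). Then for 
$i \in \mathrm{supp}(w)$ 

$\frac{\partial}{\partial w_i}L(w,d,\lambda)\Big|_{(w,d)} = \frac{\partial}{\partial w_i}\left(w^T R w - w^T\hat{\mu} + \sum_{j=1}^{N}\beta_j|w_j|\right)$ 

and 

$\frac{\partial}{\partial d_i}L(w,d,\lambda)\Big|_{(w,d)} = 0$. 

For $i \notin \mathrm{supp}(w)$ we have by equation (49) 

$\frac{\partial}{\partial w_i}L(w,d,\lambda)\Big|_{(w,d)} = 0$ 

and by equation (48) 

$\frac{\partial}{\partial d_i}L(w,d,\lambda)\Big|_{(w,d)} = 0$. 

It then follows from equation (28) that 

$\|\nabla_{w,d}L(w,d,\lambda)\|_{\ell_2} \le \sqrt{2\epsilon\alpha_0}$ 

and so by Lemma 7 and our choice of $\lambda$ we have that 

$\Phi(w,d) = L(w,d,\lambda)$ 
$\quad \le \Phi(w^*,d^*) + \epsilon$. 

This clearly implies that 

$\Psi(w) \le \Psi(w^*) + \epsilon$. 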


A.2.2 Proof of Theorem 3 

Now we prove Theorem 3 which can be used to establish a more practical 
convergence criterion than Theorem 2. 


Proof 8 of Theorem [3] 

By construction ||£ - wW^ < ||C _ w\\h - ^ follows that 


E 

lesupp(C) 


d 


( dw t 


( T 

w Rw 


T - 

w ji + \ 


w\ 


'PA 


) ) <(n/ 2 + 1 ) 2 a 0 e 

w=C/ 


and 


$-\beta_i \le \frac{\partial}{\partial w_i}\left(w^T R w - w^T\hat{\mu}\right)\Big|_{w=\zeta} \le \beta_i$ 

for all $i \notin \mathrm{supp}(\zeta)$. So by Theorem 2 we have that $\zeta$ satisfies (33). 